Add support for the parameter-shift hessian to the Autograd interface #1131
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##           master    #1131   +/-   ##
=======================================
  Coverage   98.12%   98.12%
=======================================
  Files         144      144
  Lines       10813    10833    +20
=======================================
+ Hits        10610    10630    +20
  Misses        203      203
```
Continue to review full report at Codecov.
The matrix caching scares me.
```python
def gradient_product(g):
# In autograd, the forward pass is always performed prior to the backwards
# pass, so we do not need to re-unwrap the parameters.
saved_grad_matrices = {}
```
Not a big fan of a global storing solution. This is going to kill memory usage if the user runs a ton of experiments.
Good point. The idea here is to simply cache the hessian/jacobian for the same parameter value, if vjp is called repeatedly by Autograd. So repeated calls by the user via qml.jacobian won't do any caching, but a single call by the user to qml.jacobian will cache under the hood for autograd.
This is basically a workaround for Python not having first-class dynamic programming (which would be very useful here).
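As an illustration of the pattern being described, here is a minimal sketch; the names make_gradient_product and compute_jacobian are made up for the example and are not the actual PennyLane implementation:

```python
def make_gradient_product(compute_jacobian, params):
    # `compute_jacobian` stands in for the expensive parameter-shift evaluation
    saved_grad_matrices = {}  # lives only as long as the returned closure

    def gradient_product(g):
        # evaluate the Jacobian at most once for these parameter values,
        # even if Autograd calls this VJP several times
        if "jacobian" not in saved_grad_matrices:
            saved_grad_matrices["jacobian"] = compute_jacobian(params)
        return g @ saved_grad_matrices["jacobian"]

    return gradient_product
```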
Regarding memory usage: I think this should be safe, since this isn't global storage; instead, it is locally scoped to within the vjp function. Once the call to vjp() is complete, CPython should delete the saved_grad_matrices data.
```python
@autograd.extend.primitive
def jacobian(p):
    jacobian = _evaluate_grad_matrix(p, "jacobian")
```
Why are you storing the results in a dictionary with string keys? Especially if you only have two keys.
Seems... dangerous lol.
I'm not sure I entirely follow?
In order to modify a variable used in a closure, it needs to be a mutable type, so either a dictionary, set, or list. If that's what you are asking about.
I was confused about that and had to do some background reading about closures to figure it out. Maybe there should be a comment about that?
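A minimal sketch of that point (illustrative only, not code from this PR):

```python
def make_counter():
    # rebinding a plain name inside `increment` would create a new local
    # variable (or require `nonlocal`); mutating a dict from the enclosing
    # scope works directly, which is why a dict is used as the cache
    state = {"count": 0}

    def increment():
        state["count"] += 1
        return state["count"]

    return increment

counter = make_counter()
counter()  # 1
counter()  # 2
```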
```python
self.set_parameters(self._all_parameter_values, trainable_only=False)

# only flatten g if all parameters are single values
saved_grad_matrices[grad_matrix_fn] = grad_matrix
```
Is saved_grad_matrices ever cleared? How does it behave between runs? Could there be unintended state poisoning between calls to this method?
Let me write some tests for this to make sure it works as I expect.
> Is saved_grad_matrices ever cleared?

I believe it is cleared by nature of CPython deleting local variables once they go out of scope. That is, once the call to vjp() is complete, saved_grad_matrices will be deleted.
I'm not 100% sure though, since it is used in locally defined functions that are returned...
Is there a way of testing/verifying this?
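One possible way to check this (a sketch only, not part of the PR; plain dicts don't support weak references, so a small dict subclass stands in for saved_grad_matrices):

```python
import gc
import weakref

class Cache(dict):
    """dict subclass so we can take a weak reference to it."""

def make_vjp_like():
    saved_grad_matrices = Cache()  # stands in for the real cache

    def vjp(g):
        saved_grad_matrices.setdefault("jacobian", g)
        return saved_grad_matrices["jacobian"]

    return vjp, weakref.ref(saved_grad_matrices)

vjp, cache_ref = make_vjp_like()
vjp(1.0)
assert cache_ref() is not None  # cache is alive while vjp is reachable

del vjp
gc.collect()
assert cache_ref() is None  # cache freed once the closure is gone
```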
Yep, you are correct, that is what will happen. That dictionary is in the namespace of the vjp function, and not the class as I originally thought. This should be fine then.
Benchmarking courtesy of @mariaschuld: runtime and memory usage plots.
Thanks @josh146! Seems like a nice addition, although I'm a bit confused about exactly how autograd works here, so it would be nice to have a brief chat about it sometime.
```python
def gradient_product(g):
# In autograd, the forward pass is always performed prior to the backwards
# pass, so we do not need to re-unwrap the parameters.
saved_grad_matrices = {}
```
Just a thought, and not necessarily better than the current solution, but could there be a case here for saving them as saved_hessian and saved_jacobian instead of storing them in a single dict? 🤔
I'm not sure I see the advantage over a dictionary - in Python, a lot of caching is done with dictionaries, including class properties, memoization, etc.
```python
def test_grad_matrix_caching(self, mocker, dev_name, diff_method):
    """Test the gradient matrix caching"""
```
How is this testing the caching? It seems like it's only testing that it doesn't cache when calling qml.jacobian or qml.hessian several times (which is good, although not what I would expect from the naming and docstring for this test).
Good point - this is still WIP; I added it after @Thenerdstation's comments. I'm still not fully sure how to test this, it might require usage of https://docs.python.org/3/library/gc.html?
I've added a bit more of an exploration here: #1131 (comment)
```python
def vhp(ans, p):
    def hessian_product(ddy):
```
I think it'd be good to get some more details on what's happening here. I'm not fully sure what's going on tbh (mainly autograd-wise).
Co-authored-by: Theodor <theodor@xanadu.ai>
@Thenerdstation I've explored the issue of the cache further, just putting my exploration here for now. Autograd's grad function is essentially implemented as follows:

```python
def grad(fun, x):
    vjp, ans = _make_vjp(fun, x)
    grad_value = vjp(vspace(ans).ones())
    return grad_value, ans
```

where the returned vjp function holds the saved_grad_matrices cache via closure. By using Python's __closure__ attribute, we can dig into the returned vjp function and inspect the cache while it is still in scope:

```python
def grad(fun, x):
    vjp, ans = _make_vjp(fun, x)

    ################################################################
    # The returned vjp function contains the saved matrix cache via closure.
    # Let's extract this.
    saved_matrices = []

    import autograd

    # extract the end node from the VJP closure
    end_node = vjp.__closure__[0].cell_contents
    outgrads = {end_node: (vspace(ans).ones(), False)}

    # iterate through all nodes in the computational graph
    for node in autograd.tracer.toposort(end_node):
        outgrad = outgrads.pop(node)
        ingrads = node.vjp(outgrad[0])

        try:
            # look deep inside the nested function closure, and attempt to
            # extract the saved matrices dictionary cache if the node is a QNode
            qnode_cache = (
                node.vjp
                .__closure__[0].cell_contents
                .__closure__[0].cell_contents
                .__closure__[1].cell_contents
            )

            if isinstance(qnode_cache, dict) and "jacobian" in qnode_cache:
                saved_matrices.append(qnode_cache)
        except:
            # the node was not a QNode; pass.
            pass

        for parent, ingrad in zip(node.parents, ingrads):
            outgrads[parent] = autograd.core.add_outgrads(outgrads.get(parent), ingrad)

    print(saved_matrices)
    ################################################################

    grad_value = vjp(vspace(ans).ones())
    return grad_value, ans  # after returning, vjp is now out of scope
```

So, inside the grad function, the cached Jacobian/Hessian matrices are still reachable through the vjp closure. As far as I can tell, however, as soon as the vjp function goes out of scope, the cache is garbage collected along with it.
pennylane/interfaces/autograd.py
```python
if dy.size > 1:
    if all(np.ndim(p) == 0 for p in params):
        # only flatten dy if all parameters are single values
        vhp = dy.flatten() @ ddy @ hessian @ dy.flatten()
```
I don't understand why we have dy twice here.
I'm not sure I can fully answer that, but it makes sense if you consider the shape of the hessian and perform dimensional analysis:
- the hessian will be of shape (num_params, num_params, num_output), dy (think of it like the gradient) has shape (num_output,), and ddy (think of it like the gradient of the gradient) has shape (num_output, num_params).

So performing the matrix multiplication will return a vhp of the right size, (num_params,).
E.g.:
```python
>>> num_params = 2
>>> num_output = 3
>>> dy = np.random.random([num_output])
>>> ddy = np.random.random([num_output, num_params])
>>> hessian = np.random.random([num_params, num_params, num_output])
>>> dy @ ddy @ hessian @ dy
array([1.82038363, 2.03458631])
```
```python
dev = qml.device(dev_name, wires=1)

@qnode(dev, diff_method=diff_method, interface="autograd")
```
I wonder if we should create an "analytic circuit" set of fixtures for the testing suite and be able to easily reuse the quantum function, result, derivative, and hessian across the different interfaces.
This is a good point 🤔 Many more tests like this exist across the interface test suite, so maybe it's something worth considering in a new PR/issue?
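For concreteness, a rough sketch of what such a fixture could look like; the circuit, values, and name analytic_circuit_case are illustrative only, not taken from this PR:

```python
import numpy as np
import pytest

@pytest.fixture
def analytic_circuit_case():
    """An analytically solvable case (e.g. RX(a), RY(b), expval(PauliZ(0)))
    together with its expected result, gradient, and Hessian, reusable
    across the different interface test suites."""

    def expectation(a, b):
        return np.cos(a) * np.cos(b)

    def gradient(a, b):
        return np.array([-np.sin(a) * np.cos(b), -np.cos(a) * np.sin(b)])

    def hessian(a, b):
        return np.array(
            [[-np.cos(a) * np.cos(b), np.sin(a) * np.sin(b)],
             [np.sin(a) * np.sin(b), -np.cos(a) * np.cos(b)]]
        )

    return expectation, gradient, hessian
```

A test could then request this fixture and compare the QNode's qml.jacobian/qml.hessian output against the expected gradient and hessian for each interface.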
Summary of things to change:
- comment clarifying closure and saved_grad_matrices
- redefining p in a comprehension

Optional suggestion:
- analytic circuits test fixture file
There are still things I don't completely understand about this, but it works and that is the most important thing :)
Thanks @albi3ro! Your comment about the parameter flattening if statement really helped make the PR better.
Co-authored-by: Theodor <theodor@xanadu.ai>
Excited to have this! :)
Co-authored-by: Theodor <theodor@xanadu.ai>
Found something to fix before getting merged in. Could definitely cause bugs down the line.
Updating from "request changes" to "approve" since the change isn't actually needed.
Context:
The Autograd custom gradient interface currently returns a non-differentiable gradient.
Description of the Change:
Modifies the Autograd interface so that the Jacobian also defines a custom gradient --- the vector-Hessian product --- where the Hessian is computed by an efficient implementation of the parameter-shift rule.
Autograd QNodes can now be doubly differentiated, on both simulators and hardware:
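For example (a usage sketch; the circuit below is illustrative and not taken from the PR):

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev, interface="autograd", diff_method="parameter-shift")
def circuit(p):
    qml.RX(p[0], wires=0)
    qml.RY(p[1], wires=0)
    return qml.expval(qml.PauliZ(0))

p = np.array([0.5, 0.1], requires_grad=True)

# first derivative: uses the parameter-shift vector-Jacobian product
grad_fn = qml.grad(circuit)
print(grad_fn(p))

# second derivative: the Jacobian of the gradient, i.e. the Hessian,
# now computed via the parameter-shift vector-Hessian product
print(qml.jacobian(grad_fn)(p))
```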
In addition, a slight tweak was made to the Jacobian/Hessian evaluation logic in order to avoid redundant Jacobian/Hessian computations.
Benefits:
Autograd QNodes now support double derivatives on hardware and software
The Jacobian/Hessian is now only computed once for given input parameters, and re-used for multiple VJPs/VHPs.
Possible Drawbacks:
3rd derivatives and higher are not supported. We could potentially support higher derivatives using recursion.
Currently, Jacobian parameter-shifts are not being re-used for the Hessian parameter-shifts. This requires more thinking.
I realised while writing this PR that the parameter-shift Hessian logic is based on the formula given here: https://arxiv.org/abs/2008.06517. However, this formula assumes all gates support the 2-term parameter-shift rule; this is not the case for the controlled rotations. Therefore, computing the Hessian of the controlled rotations will give an incorrect result!
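For reference, the 2-term assumption corresponds to the following second-order shift rule (sketched here for a shift of $\pi/2$; my paraphrase of the idea, not a quote from the paper):

$$
\frac{\partial^2 f}{\partial\theta_i\,\partial\theta_j}
= \frac{1}{4}\left[
f\!\left(\theta + \tfrac{\pi}{2}(e_i + e_j)\right)
- f\!\left(\theta + \tfrac{\pi}{2}(e_i - e_j)\right)
- f\!\left(\theta - \tfrac{\pi}{2}(e_i - e_j)\right)
+ f\!\left(\theta - \tfrac{\pi}{2}(e_i + e_j)\right)
\right]
$$

which follows from applying the 2-term gradient rule $\partial_i f(\theta) = \tfrac{1}{2}\left[f(\theta + \tfrac{\pi}{2}e_i) - f(\theta - \tfrac{\pi}{2}e_i)\right]$ twice; it breaks down for gates such as the controlled rotations, whose gradient requires a 4-term rule.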
Long term, we should consider implementing parameter-shift logic as follows:
- compute the vjp and vhp directly. This avoids redundant parameter-shift evaluations.

Related GitHub Issues: n/a